Exploratory data analysis of interval-valued symbolic data with matrix visualization

Authors

  • Chiun-How Kao
  • Junji Nakano
  • Sheau-Hue Shieh
  • Yin-Jing Tien
  • Han-Ming Wu
  • Chuan-kai Yang
  • Chun-Houh Chen

Abstract

Symbolic data analysis (SDA) has gained popularity over the past few years because of its potential for handling data with a dependent and hierarchical nature. Among the many methods for analyzing symbolic data, exploratory data analysis (EDA; Tukey, 1977) with graphical presentation is an important one. Recent developments in graphical and visualization tools for SDA include zoom star, closed shapes, and parallel-coordinate plots. Other studies project high-dimensional symbolic data into a lower-dimensional space using SDA versions of principal component analysis, multidimensional scaling, and self-organizing maps. Most graphical and visualization approaches for exploring the structure of symbolic data inherit the advantages of their counterparts for conventional (non-SDA) data, but also their disadvantages. Here we introduce matrix visualization (MV) for visualizing and clustering symbolic data, using interval-valued symbolic data as an example; it is by far the most popular symbolic data type in the literature and the most commonly encountered one in practice. Many MV techniques for visualizing and clustering conventional data are adapted to symbolic data, and several techniques are newly developed for it. Various examples of data with simple to complex structures are used to illustrate the proposed methods.


Similar resources

Symbolic Covariance Matrix for Interval-valued Variables and its Application to Principal Component Analysis: a Case Study

In the last two decades, principal component analysis (PCA) was extended to interval-valued data; several adaptations of the classical approach are known from the literature. Our approach is based on the symbolic covariance matrix Cov for the interval-valued variables proposed by Billard (2008). Its crucial advantage, when compared to other approaches, is that it fully utilizes all the informat...


Symbolic Principal Components for Interval-valued Observations

One feature of contemporary datasets is that instead of the single point value in the p-dimensional space $\mathbb{R}^p$ seen in classical data, the data may take interval values, thus producing hypercubes in $\mathbb{R}^p$. This paper extends the methodology of classical principal components to that for interval-valued data. Two methods are proposed, viz., a vertices method which uses all the vertices of the observation...


A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining

Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...


(T,S)-BASED INTERVAL-VALUED INTUITIONISTIC FUZZY COMPOSITION MATRIX AND ITS APPLICATION FOR CLUSTERING

In this paper, the notions of $(T,S)$-composition matrix and $(T,S)$-interval-valued intuitionistic fuzzy equivalence matrix are introduced, where $(T,S)$ is a dual pair of triangular modules. They are the generalization of composition matrix and interval-valued intuitionistic fuzzy equivalence matrix. Furthermore, their properties and characterizations are presented. Then a new method based on $tilde{...


Interval-Valued Hesitant Fuzzy Method based on Group Decision Analysis for Estimating Weights of Decision Makers

In this paper, a new soft computing group decision method based on the concept of compromise ratio is introduced for determining the weights of decision makers (DMs) through the group decision process under uncertainty. In this method, the preferences and judgments of the DMs or experts are expressed by linguistic terms for rating the industrial alternatives among selected criteria as well as the relativ...



Journal:
  • Computational Statistics & Data Analysis

Volume 79  Issue 

Pages  -

Publication date 2014